Nowadays, the huge amount of information distributed through the Web motivates the study of techniques for extracting relevant data in an efficient and reliable way. Both academia and industry have developed several approaches to Web data extraction, for example using techniques from artificial intelligence and machine learning. Some commonly adopted procedures, namely wrappers, ensure a high degree of precision of the information extracted from Web pages and, at the same time, must prove robust in order not to compromise the quality and reliability of the data themselves. In this paper we focus on some experimental aspects related to the robustness of the data extraction process and the possibility of automatically adapting wrappers. We discuss the implementation of algorithms for finding similarities between two different versions of a Web page, in order to handle modifications, avoid the failure of data extraction tasks, and ensure the reliability of the extracted information. Our purpose is to evaluate the performance, advantages, and drawbacks of our novel system for automatic wrapper adaptation.
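To illustrate the kind of similarity computation the abstract refers to, the sketch below implements the classical "simple tree matching" scheme over small DOM-like trees: two versions of a page are compared structurally, and a normalized score indicates how much of the original structure survives. This is only an illustrative sketch under that assumption; the `Node` class, function names, and example trees are hypothetical, not the paper's actual implementation.

```python
class Node:
    """A minimal labeled DOM-like tree node (hypothetical helper)."""
    def __init__(self, label, children=None):
        self.label = label
        self.children = children or []

def size(t):
    """Total number of nodes in tree t."""
    return 1 + sum(size(c) for c in t.children)

def simple_tree_matching(a, b):
    """Size of the largest matching between trees a and b that
    preserves node labels, ancestry, and sibling order."""
    if a.label != b.label:
        return 0
    m, n = len(a.children), len(b.children)
    # Dynamic program over the ordered child lists, as in sequence alignment:
    # M[i][j] = best matching using the first i children of a, first j of b.
    M = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            M[i][j] = max(
                M[i - 1][j],
                M[i][j - 1],
                M[i - 1][j - 1]
                + simple_tree_matching(a.children[i - 1], b.children[j - 1]),
            )
    return 1 + M[m][n]  # +1 accounts for the matched roots

def similarity(a, b):
    """Normalized similarity in [0, 1] between two trees."""
    return 2.0 * simple_tree_matching(a, b) / (size(a) + size(b))

# Example: an original page version and a modified one missing a <div>.
old = Node("html", [Node("body", [Node("div", [Node("table")]), Node("div")])])
new = Node("html", [Node("body", [Node("div", [Node("table")])])])
print(similarity(old, new))  # close to 1: only one node was removed
```

A wrapper-adaptation system could use such a score to decide whether a changed page is still "the same" page (score above a threshold), and then relocate extraction rules inside the best-matching subtrees instead of failing outright.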